Add live audio transcription streaming support to Foundry Local JS SDK #486

Open
rui-ren wants to merge 2 commits into main from ruiren/audio-streaming-support-sdk-js

rui-ren commented Mar 5, 2026

Here's the updated PR description with the renamed types:


Title: Add live audio transcription streaming support to Foundry Local JS SDK

Description:

Adds real-time audio streaming support to the Foundry Local JS SDK, enabling live microphone-to-text transcription via ONNX Runtime GenAI ASR.

The existing AudioClient only supports file-based transcription. This PR introduces LiveAudioTranscriptionClient that accepts continuous PCM audio chunks (e.g., from a microphone) and returns partial/final transcription results as an async iterable.

What's included

New files

  • src/openai/liveAudioTranscriptionClient.ts — Streaming client with start(), pushAudioData(), getTranscriptionStream(), stop(), dispose()
  • src/openai/liveAudioTranscriptionTypes.ts — LiveAudioTranscriptionResult and CoreErrorResponse interfaces, tryParseCoreError() helper

Modified files

  • src/imodel.ts — Added createLiveTranscriptionClient() to interface
  • src/model.ts — Delegates to selectedVariant.createLiveTranscriptionClient()
  • src/modelVariant.ts — Implementation (creates new LiveAudioTranscriptionClient(modelId, coreInterop))
  • src/index.ts — Exports LiveAudioTranscriptionClient, LiveAudioTranscriptionSettings, LiveAudioTranscriptionResult, CoreErrorResponse

API surface

const audioClient = model.createAudioClient();
const session = model.createLiveTranscriptionClient();

session.settings.sampleRate = 16000;
session.settings.channels = 1;
session.settings.language = "en";

await session.start();

// Push audio from microphone callback
await session.pushAudioData(pcmBytes);

// Read results as async iterable
for await (const result of session.getTranscriptionStream()) {
    console.log(result.text);
}

await session.stop();
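
The settings assigned before start() and the bytes passed to pushAudioData() are both described below as being copied by the session. A minimal sketch of what that could look like, assuming a hypothetical LiveSettings shape and hypothetical helper names (snapshotSettings, copyChunk) that are not taken from the PR's source:

```typescript
// Hypothetical settings shape; field names follow the API surface example above.
interface LiveSettings {
    sampleRate: number;
    channels: number;
    language?: string;
}

// Take a shallow snapshot and freeze it, so later writes to the caller's
// settings object cannot affect a running session.
function snapshotSettings(settings: LiveSettings): Readonly<LiveSettings> {
    return Object.freeze({ ...settings });
}

// Copy the chunk so the caller may reuse its buffer right after pushing.
function copyChunk(chunk: Uint8Array): Uint8Array {
    return new Uint8Array(chunk); // the typed-array constructor copies the elements
}

const settings: LiveSettings = { sampleRate: 16000, channels: 1, language: "en" };
const active = snapshotSettings(settings);
settings.sampleRate = 48000;        // mutating the original after the snapshot
console.log(active.sampleRate);     // 16000, the snapshot is unaffected

const micBuffer = new Uint8Array([1, 2, 3]);
const queued = copyChunk(micBuffer);
micBuffer.fill(0);                  // caller reuses its buffer
console.log(Array.from(queued));    // [ 1, 2, 3 ]
```

The snapshot is shallow, which is enough here because every field in this assumed shape is a primitive.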

Design highlights

  • Internal async push queue — Bounded AsyncQueue<T> serializes audio pushes from any context (safe for mic callbacks) and provides backpressure. Mirrors C#'s Channel<T> pattern.
  • Retry policy — Transient native errors are retried with exponential backoff (3 attempts); permanent errors terminate the session
  • Settings freeze — Audio format settings are snapshot-copied and Object.freeze()d at start(), immutable during the session
  • Buffer copy — pushAudioData() copies the input Uint8Array before queueing, safe when caller reuses buffers
  • Drain-on-stop — stop() completes the push queue, waits for the push loop to drain, then calls native stop
  • Dispose safety — dispose() wraps stop() in try/catch, never throws
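
To illustrate the bounded push queue and the drain-on-stop ordering listed above, here is a rough sketch of one way such a queue could be written. BoundedAsyncQueue and its method names are hypothetical and are not the PR's actual AsyncQueue<T>; the point is only that push() waits while the queue is full and that complete() lets the consumer drain what is left before finishing.

```typescript
// Hypothetical bounded queue; not the SDK's actual AsyncQueue<T>.
class BoundedAsyncQueue<T> implements AsyncIterable<T> {
    private items: T[] = [];
    private closed = false;
    private waitingReaders: Array<() => void> = [];
    private waitingWriters: Array<() => void> = [];

    constructor(private readonly capacity: number) {}

    // Resolves once the item is enqueued; waits while the queue is full (backpressure).
    async push(item: T): Promise<void> {
        while (!this.closed && this.items.length >= this.capacity) {
            await new Promise<void>((resolve) => this.waitingWriters.push(resolve));
        }
        if (this.closed) throw new Error("queue is closed");
        this.items.push(item);
        this.waitingReaders.shift()?.(); // wake one waiting reader, if any
    }

    // No more pushes are accepted; readers drain what is left and then finish.
    complete(): void {
        this.closed = true;
        for (const wake of this.waitingReaders.splice(0)) wake();
        for (const wake of this.waitingWriters.splice(0)) wake();
    }

    async *[Symbol.asyncIterator](): AsyncGenerator<T> {
        while (true) {
            if (this.items.length > 0) {
                const item = this.items.shift() as T;
                this.waitingWriters.shift()?.(); // a slot is free, wake one writer
                yield item;
            } else if (this.closed) {
                return;
            } else {
                await new Promise<void>((resolve) => this.waitingReaders.push(resolve));
            }
        }
    }
}

async function demo(): Promise<number[]> {
    const queue = new BoundedAsyncQueue<number>(2);
    const received: number[] = [];
    const consumer = (async () => {
        for await (const n of queue) received.push(n);
    })();
    for (let i = 0; i < 5; i++) await queue.push(i); // waits whenever 2 items are pending
    queue.complete();  // analogous to stop(): no new pushes
    await consumer;    // wait for the loop to drain before "native stop"
    return received;
}

demo().then((received) => console.log(received));
```

In a session, the consumer loop would forward each chunk to the native push command, and stop() would await that loop after calling complete().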

Native core dependency

This PR adds the JS SDK surface. The 3 native commands (audio_stream_start, audio_stream_push, audio_stream_stop) are routed through the existing execute_command / execute_command_with_binary exports. The code compiles with zero TypeScript errors without the native library.

Testing

  • ✅ TypeScript compilation — 0 errors across all source files
  • ⏳ Integration tests pending native core delivery

Parity with C# SDK

This implementation mirrors the C# LiveAudioTranscriptionSession (branch ruiren/audio-streaming-support-sdk) with identical logic:

  • Same session lifecycle: start → push → getStream → stop
  • Same push loop with retry and permanent error handling
  • Same settings freeze and buffer copy semantics
  • Same drain-before-stop ordering
  • Same renamed types: LiveAudioTranscription* (matching C# rename)
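
The retry behaviour shared by both SDKs can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: TransientError, isTransient and withRetry are hypothetical names, "3 attempts" is read here as three total tries, and the base delay is arbitrary. The actual implementation presumably classifies errors from the native core's error response.

```typescript
// Hypothetical marker for retryable failures.
class TransientError extends Error {
    readonly transient = true;
}

function isTransient(err: unknown): boolean {
    return typeof err === "object" && err !== null &&
        (err as { transient?: boolean }).transient === true;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `operation`, retrying transient failures up to `maxAttempts` total tries.
// The delay doubles after each failed attempt; any other error is rethrown at once.
async function withRetry<T>(
    operation: () => Promise<T>,
    maxAttempts = 3,
    baseDelayMs = 50,
): Promise<T> {
    for (let attempt = 1; ; attempt++) {
        try {
            return await operation();
        } catch (err) {
            const retryable = isTransient(err) && attempt < maxAttempts;
            if (!retryable) throw err; // permanent error or attempts exhausted
            await sleep(baseDelayMs * 2 ** (attempt - 1));
        }
    }
}

// Usage: a fake push that fails twice with a transient error, then succeeds.
let calls = 0;
withRetry(async () => {
    calls++;
    if (calls < 3) throw new TransientError("native core busy");
    return "pushed";
}, 3, 1).then((result) => console.log(result, "after", calls, "calls"));
```

In a push loop, an error that escapes withRetry would be the signal to terminate the session rather than continue with the next chunk.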

rui-ren changed the title from "Add real-time audio streaming support (Microphone ASR) - JS" to "Add live audio transcription streaming support to Foundry Local JS SDK" on Mar 13, 2026